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Recently,  ICASE  has  begun  differentiating  between  reports  with  a  mathemat¬ 
ical  or  applied  science  theme  and  reports  whose  main  emphasis  is  some  aspect  of 
computer  science  by  producing  the  computer  science  reports  with  a  yellow  cover. 
The  blue  cover  reports  will  now  emphasize  mathematical  research.  In  all  other 
aspects  the  reports  will  remain  the  same;  in  particular,  they  will  continue  to  be 
submitted  to  the  appropriate  journals  or  conferences  for  formal  publication. 
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The  Use  of  Lanczos’s  Method  to  Solve  the 
Large  Generalized  Symmetric  Definite 
Eigenvalue  Problem 

Mark  T.  Jones*and  Merrell  L.  Patrick*^ 


j  Abstract  , 

/ 

The  generalized  eigenvalue  problem,  Kx  =  XMx,  is  of  signifi¬ 
cant  practical  importance,  especially  in  structural  engineering  where 
it  arises  as  the  vibration  and  buckling  problems.  A  new  algorithm, 
LANZ,  based  on  Lanczos’s  I'lethod  is  developed.  LANZ  uses  a  tech¬ 
nique  called  dynamic  shifting  to  improve  the  efficiency  and  reliability 
of  ‘he  Lanczos  algorithm.  A  new  algorithm  for  solving  the  tridiagonal 
matrices  that  arise  when  using  Lanczos’s  method  is  described.  A  mod¬ 
ification  of  Parlett  and  Scott’s  selective  orthogonalization  algorithm 
is  proposed.  Results  from  an  implementation  of  LANZ  on  a  Convex 
C-220  show  it  to  be  superior  to  a  subspace  iteration  code. 
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1  Introduction 

The  solution  of  the  symmetric  generalized  eigenvalue  problem, 


Kx  =  XMx,  (1) 

where  K  and  M  are  real,  symmetric  matrices,  and  either  K  or  M  is  positive 
semi-definite,  is  of  significant  practical  importance,  especially  in  structural 
engineering  as  the  vibration  problem  and  the  buckling  problem  (BH87).  The 
matrices  K  and  M  are  either  banded  or  sparse.  Usually  p  <<  n  of  the  small¬ 
est  eigenvalues  of  Equation  1  are  sought,  where  n  is  the  order  of  the  system. 
The  method  of  Lanezos,  suitably  altered  for  the  generalized  eigenvalue  prob¬ 
lem,  is  shown  to  be  useful  for  the  efficient  solution  of  Equation  1.  [Lan50) 
[NOPEJ87]. 

A  sophisticated  algorithm,  based  on  the  sample  Lanez*^  ju 

developed  in  this  paper.  The  algorithm,  called  LANZ,  has  been  imple¬ 
mented  on  supercomputer  architectures  at  NASA  Lcingley  Research  Center 
and  results  from  the  implementation  are  discussed.  Two  applications  from 
structural  engineering  are  described  in  Section  2.  The  properties  of  the  gener¬ 
alized  eigenvalue  problem  and  solution  methods  are  given  in  Section  3.  The 
simple  Lanezos  algorithm  is  presented  in  Section  4.  The  use  of  Lanezos’s 
method  for  the  generalized  eigenvalue  problem  and  the  LANZ  algorithm 
are  discussed  in  Section  5.  An  execution  time  cost  analysis  of  LANZ  is 
developed  in  Section  6.  The  solution  of  the  tridiagonal  matrices  that  arise 
when  using  Lanezos’s  method  is  considered  in  Section  7.  Methods  for  solving 
the  symmetric  indefinite  linear  systems  that  arise  in  LANZ  are  described  in 
Section  8.  In  Section  9,  the  performance  of  the  LANZ  algorithm  is  analyzed 
and  compared  to  the  performance  of  subspace  iteration,  the  most  prevalent 
method  for  solving  this  class  of  problems  in  structural  engineering. 
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2  Problems  of  Interest 

Two  important  applications  of  LANZ  which  arise  in  structural  engineering 
are  the  vibration  problem  and  the  buckling  problem.  Practical  problems 
from  these  applications  will  be  used  to  test  the  performance  of  the  program. 

In  the  free  vibration  problem,  the  vibration  frequencies,  w,  and  mode 
shape  vectors,  x,  of  a  structure  are  sought.  The  equation 

Kx  =  uPMx,  (2) 

is  solved,  where  K  is  the  positive  definite  stiffness  matrix  and  M  is  the  semi¬ 
positive  definite  mass  matrix.  The  mass  matrix,  M,  can  be  either  diagonal 
(in  which  case  it  is  referred  to  as  a  diagonal  mass  matrix)  or  can  have  approx¬ 
imately  the  same  structure  as  K  (in  which  case  it  is  referred  to  as  a  consistent 
mass  matrix).  Because  K  is  positive  definite  and  M  is  semi-positive  definite, 
the  eigenvalues  of  2  are  non-negative.  When  a  dynamic  load  is  applied  to  a 
structure,  the  structure  begins  to  vibrate.  If  the  vibration  frequency  is  close 
to  a  natural  frequency  of  the  structure,  w,  then  the  resulting  resonance  can 
lead  to  structural  failure.  Engineers  want  to  ensure  that  a  structure  has  no 
natural  frequencies  near  the  range  of  frequencies  of  expected  dynamic  loads. 
Engineers  are  often  interested  in  a  fc'.v  of  the  lowest  frequencies  of  the  struc¬ 
ture  to  ensure  that  these  natural  frequencies  are  well  above  the  frequencies 
of  expected  dynamic  loads.  [Jon89].  In  some  situations,  all  the  frequencies 
in  a  particular  range  may  be  sought.  [Kni89]. 

In  the  buckling  problem,  the  smallest  load  at  which  a  structure  will  buckle 
is  sought.  These  buckling  loads  of  a  structure.  A,  are  the  eigenvalues  of  the 
system, 

Kx  =  —XKcx,  (3) 

v>here  K  is  the  positive  definite  stiffness  matrix,  Kc  is  the  geometric  stiffness 
matrix,  and  i  is  the  buckling  mode  shape  vector  [Mah87].  The  Kq  matrix 
has  approximately  the  same  structure  as  K  and  can  be  indefinite  and  singu¬ 
lar.  Because  Kq  is  indefinite,  the  eigenvalues  of  Equation  3  can  be  negative, 
although  in  most  practical  problems  they  are  positive.  Equation  3  can  have 
up  to  n  distinct  eigenvalues;  although  only  the  smallest  eigenvalue  is  of  phys¬ 
ical  importance  since  any  load  greater  than  the  smallest  buckling  load  will 
cause  buckling  [Jon89].  Designers  may,  however,  be  interested  in  the  six  to 
ten  lowest  eigenvalues  in  order  to  observe  the  spacing  of  the  buckling  loads. 
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If  the  lowest  buckling  loads  are  great'y  separated  from  other  bnrkling  loads, 
the  designer  may  seek  to  change  the  structure  in  order  to  cause  the  lowest 
buckling  loads  to  become  closer  to  the  other  buckling  loads  of  the  structure. 
[Kni89].  Because  the  Kq  rnatrix  can  be  indefinite  and  singular,  the  buckling 
problem  is  more  difficult  to  solve  numerically  than  the  vibration  problem. 
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3  The  Generalized  Eigenvalue  Problem 
3.1  Properties 

Equation  1  yields  n  eigenvalues  if  and  only  if  the  rank  of  M  is  equal  to  n, 
which  may  not  be  the  case  in  the  problems  under  consideration  [GL83].  In 
fact,  if  M  is  rank  deficient,  then  the  number  of  eigenvalues  of  Equation  1 
may  be  finite,  empty,  or  infinite  [GL83].  In  practical  problems,  however,  the 
set  of  eigenvalues  is  finite  and  non-empty.  When  K  and  M  are  symmetric 
and  one  or  both  of  them  is  positive  definite,  the  eigenvalues  of  Equation  1 
are  real  (without  sacrificing  generality,  it  will  be  assumed  that  M  is  positive 


definite).  An  orthogonal  matrix,  R,  exists  such  that 

R'^MR  =  diag{^^)  =  (4) 

where  the  are  the  eigenvalues  of  M  and  are  therefore  positive  real.  If  M 
is  expanded  to  RD^R^^  then  Equation  1  becomes 

(K  -  \M)x  =  {K  -  XRDDR'^)x.  (5) 

Equation  5  can  be  rearranged  to  become 

{K  -  \M)x  =  {RD{D-'^R^KRD-^  -  XI)DR^)x.  (6) 

Taking  the  determinant  of  each  side  yields 

dei{K  -  AM)  =  {det{R)f{dei[D)fdet{P  -  XI),  (7) 


where  P  =  D~^RFKRD~^  .  Because  R  is  orthogonal  dei{R)  =  1.  Dei{D)  is 
the  product  of  the  eigenvalues  of  M,  which  are  all  positive  real.  Therefore, 
the  roots  of  det{K  —  AM)  are  those  of  det{P  —  XI).  Because  P  is  a  symmetric 
matrix,  its  eigenvalues  are  all  real  and  therefore  those  of  Equation  1  are  all 
real  [Wil65]. 

When  K  is  positive  definite  and  M  is  positive  semi-definite,  as  they  are  in 
the  vibration  problem,  the  eigenvalues  of  Equation  1  are  all  non-negative.  To 
show  that  the  eigenvalues  are  non-negative,  multiply  each  side  of  Equation  1 
by  x'^  to  get 

x'^Kx  =  Ax^Mz.  (8) 
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Because  K  is  positive  definite,  x^Kx  is  positive,  and  because  A/  is  posi¬ 
tive  semi-definite,  x^Mx  is  non-negative;  therefore,  A  must  be  non-negative 
[Wil65]. 

The  eigenvectors  of  Equation  1  satisfy  M-orthogonality  (i^A/y  =  0,  if  i 
and  y  are  M-orthogonal).  The  matrix,  P,  has  a  complete  set  of  eigenvectors, 


Zi,  and  the  same  eigenvalues  as  Equation  1.  Thus, 

Pzi  =  XiZi,  (9) 

and, 

D-^R^KRD'^Zi  =  A.r..  (10) 

Then,  the  sequence  of  transformations  in  Equation  11  show  that  x  =  RD~^  z, 

KRD-^Zi  =  XiRDzi  =  XiRD{DR^RD-^)zi  =  X,M{RD-^z,)  (11) 

Because  the  2,  are  known  to  be  orthogonal, 

0  =  zjzj  =  {DR'^XifDR^Xj  =  XiMxj.  (12) 


The  methods  and  algorithms  discussed  in  this  paper  rely  on  the  eigenvalues 
being  real  and  the  eigenvectors  being  M-orthogonal.  In  addition,  LANZ 
makes  use  of  the  property  that  the  eigenvalues  in  the  vibration  problem  are 
non- negative. 

3.2  Solution  Methods 

Several  methods  for  solving  the  large  symmetric  generalized  eigenvalue  prob¬ 
lem  exist,  A  popular  method,  especially  in  structural  engineering,  is  sub¬ 
space  iteration  [Bat82].  Nour-Omid,  Parlett  and  Taylor  provide  convincing 
theoretical  and  experimental  evidence  that  Lanczos’s  method  is  superior  to 
subspace  iteration  on  a  sequential  machine  [NOPT83].  The  parallelization  of 
subspace  iteration  was  investigated  by  Mahajan  and  it  remains  an  open  ques¬ 
tion  as  to  whether  Lanczos’s  method  will  be  superior  to  subspace  iteration 
on  parallel  machines  [Mah87].  Another  solution  method  for  the  generalized 
eigenvalue  problem  is  a  two-level  iterative  method  proposed  by  Szyld  [Szy83]. 
He  uses  a  combination  of  the  inverse  iteration  and  Rayleigh  quotient  meth¬ 
ods  at  the  outer  level,  and  an  iterative  solver  for  indefinite  systems  at  the 
inner  level.  Szyld  assumes,  however,  that  M  is  non-singular,  which  is  not 
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always  the  case  for  the  applications  under  examination.  A  promising  method 
for  parallel  machines  based  on  the  Sturm  sequence  property  is  proposed  by 
Ma  [Ma88].  He  uses  multi-sectioning  or  bi-sectioning  to  obtain  the  eigen¬ 
values,  and  then  uses  inverse  iteration  to  recover  the  eigenvectors.  Again, 
the  assumption  is  made  that  M  is  non-singular.  Schwarz  proposes  a  method 
which  minimizes  the  Rayleigh  quotient  by  means  of  the  conjugate  gradient 
method.  The  method  uses  a  partial  Choleski  decomposition  as  a  precon¬ 
ditioner  to  speed  convergence  [Sch89].  SOR-based  methodr.  have  ^dso  been 
proposed  for  the  generalized  eigenvalue  problem,  but  these  suffer  from  two 
flaws;  1)  the  methods  have  difficulty  when  the  eigenvalues  are  clustered,  and 
2)  the  methods  require  that  M  be  positive  definite  [Ruh74].  Other  meth¬ 
ods  have  also  been  proposed  [SW82)  (Sch74).  Block  Lanezos  methods  have 
been  developed  but  are  more  complicated  than  the  simple  Lanezos  process. 
Block  methods  are  limited  in  that  the  user  of  the  method  must  choose  the 
size  of  the  blocks  [NOC85].  One  significant  advantage  of  the  block  methods 
is  that  they  can  easily  reveal  the  presence  of  multiple  eigenvalues  [GU77]. 
To  the  best  of  the  authors’  knowledge,  no  satisfactory  method  of  choosing 
this  block  size  to  best  determine  the  multiplicity  of  eigenvalues  has  been 
proposed.  The  block  size  is  usually  chosen  to  take  advantage  of  a  particular 
computing  architecture  [Ruh89]. 
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4  Lanczos’s  Method 


4.1  Execution  in  Exact  Arithmetic 

In  order  to  understand  Lanczos’s  method  when  applied  to  the  generalized 
eigenvalue  problem,  it  is  first  necessary  to  examine  the  method  when  applied 
to 

Ax  =  Xx,  (13) 

where  >1  is  an  n  x  n  real  symmetric  matrix.  In  Lanczos’s  method,  the  matrix 
A  is  transformed  into  a  tridiagonal  matrix,  T,  in  n  steps  in  exact  arithmetic. 
However,  roundoff  errors  make  the  simple  use  of  Lanczos’s  method  as  a  di¬ 
rect  method  for  tridiagonalizing  A  impractical  [Sim84].  Paige  suggests  using 
Lanczos’s  method  as  an  iterative  method  for  finding  the  extremal  eigenvalues 
of  a  matrix  [Pai7l].  At  each  step,  j,  of  the  Lanczos  algorithm,  a  tridiagonal 
matrix,  Tj,  is  generated.  The  extremal  eigenvalues  of  Tj  approximate  those 
of  A,  and  as  j  grows,  the  approximations  become  increasingly  good  [GL83]. 
Both  Kaniel  and  Saad  have  examined  the  convergence  properties  of  Lanczos 
in  exact  arithmetic  [Kan66]  [SaaSO].  The  convergence  results  that  they  de¬ 
rive  imply  that  the  speed  of  convergence  of  the  eigenvalues  of  the  T  matrices 
to  eigenvalues  of  A  depends  on  the  distribution  of  the  eigenvalues  of  A\  if  an 
extreme  eigenvalue  of  A  is  well  separated  from  its  neighbors,  convergence  to 
this  eigenvalue  will  be  fast.  However,  clustered  eigenvalues,  or  eigenvalues 
that  are  in  the  middle  of  the  spectrum  of  A,  will  not  be  converged  to  as 
quickly  as  the  extremal  eigenvalues. 

In  Lanczos’s  method,  a  series  of  orthonormal  vectors,  . .  .qn,  is  gener¬ 
ated  which  satisfy: 

T  =  Q^AQ,  (14) 

where  the  vectors  g,  are  the  columns  of  Q.  If  each  side  of  Equation  14  is 
multiplied  by  Q  to  yield, 

QT  =  AQ,  (15) 

and  the  columns  on  each  side  are  set  equal,  then 

=  Aqj.  (16) 

where  the  a’s  and  /?’s  are  the  diagonal  and  subdiagonal,  respectively,  of  T, 
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the  tridiagonal  matrix  in  Equation  17  (GL83]. 


ai  ^2 

02  012  03 

03  Ol3  04 

0j-l  Qj_i  0j 

02  ^3 

Equation  16  can  be  rearranged  to  become  the  three-term  recurrence  relation 

0i+iqj+i  =  Mi  ~  qjoij  -  <ij-i0j-  (18) 

From  this  recurrence  relation  at  step  j: 

AQi  =  QiTi^r,e],  (19) 

where  rj  =  0j+i  and  Cj  is  the  ^'th  column  of  the  j  x  j  identity  ma¬ 
trix  [GL83].  These  q  vectors,  also  called  Lanczos  vectors,  are  the  key  to  the 
algorithm.  They  are  generated  as  shown  in  Equation  18.  That  this  recur¬ 
rence  relation  generates  a  set  of  Lanczos  vectors,  qi . . .  gj,  which  belong  to  the 
Krylov  subspace,  k{A,  qi,j),  can  be  shown  by  construction.  That  these  Lanc¬ 
zos  vectors  are  also  orthonormal,  and  therefore  span  the  Krylov  subspace, 
x(y4,gi,j),  is  shown  by  induction  in  [GL83].  If  the  Lanczos  vectors  are  not 
orthonormal,  then  this  three- term  recurrence  relation  does  not  generate  the 
correct  T  matrix. 

Two  of  the  major  benefits  of  the  simple  Lanczos  algorithm  are:  1)  the 
structure  of  the  matrix  A  is  not  important  for  the  Lanczos  algorithm;  the  only 
access  to  A  that  the  algorithm  requires  is  a  routine  that  returns  the  product 
of  A  times  a  vector,  and  2)  at  step  j,  only  two  of  the  Lanczos  vectors, 
and  qj,  are  required  to  be  stored  in  core  memory,  the  rest  can  be  stored  in 
secondary  memory  until  needed  for  the  computation  of  the  eigenvectors.  The 
simple  Lanczos  algorithm  is  shown  in  Figure  1  [GL83]. 

The  tridiagonal  eigenvalue  problem  generated  by  Lanczos’s  method  can 
be  written  in  the  form 

TjS  =  0s,  (20) 

w’here  the  0’s  are  sometimes  called  Ritz  values.  The  eigenvalues  of  Tj  will 
approximate  those  of  A^  especially  the  extremal  ones.  Approximations  to  the 
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To  =  starting  vtctor 

=  II  ro  II 

9o  =  0 

j  =  1 

while  ^  0) 

9j  = 

Tj  =  Aqj  -  Q^q,  -  0jqj^i 

=  II  r,  (I 

j  =  j  +  1 


Figure  1:  The  simple  Lanczos  algorithm 

eigenvectors  of  A,  called  Ritz  vectors,  can  be  calculated  using  the  equation 

yi  =  Q,Si,  (21) 

where  yi  is  the  Ritz  vector  couesponding  to  Si  [PN085].  These  Ritz  vectors 
satisfy:  [NOPEJ87] 

Avi  =  yA  +  rj3,{j).  (22) 

4.2  Effect  of  Roundoff  Errors 

The  algorithm  and  equations  in  Subsection  4.1  hold  only  in  exact  arithmetic; 
they  degenerate  in  the  presence  of  roundoff  errors,  and,  therefore,  the  Lanczos 
vectors  can  no  longer  be  eissumed  to  be  orthonormal.  When  roundoff  errors 
are  taken  into  account,  Equation  19  becomes 

AQj^Q:Tj  +  r,cJ  F„  (23) 

where  Fj  is  used  to  represent  roundoff  error.  Then, 


II  Ayi  -  yA  II  <  AiA  ||  F,  ||,  (24) 

can  be  used  to  bound  the  error  in  the  Ritz  pair  (0,,  j/,).  The  norm  of  F  can 
be  bounded  by  jj  ^4  ji.  where  e  is  a  constant  based  on  the  floating  point 
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precision  of  the  computer,  and  j3j:  =  ySj+i  |  Si{j)  j  [PN085].  The  bound  on 
]|  F{j)  II  is  small  and  can  be  disregarded.  The  most  important  factor  then 
becomes  From  [Pai7l] 

I  1=  (25) 

where  7  is  a  roundoff  term  approximately  equal  to  e  ||  j4  ||.  From  Equation  25, 
the  conclusion  can  be  drawn  that  cis  becomes  small  (and  hence  the  error 
in  the  Ritz  pair,  (0i,y,),  also  becomes  small),  the  q  vectors  lose  orthogonality 
[ParSO].  Equation  25  implies  that  convergence  of  a  Ritz  pair  to  an  eigenpair 
of  A  results  in  a  lack  of  orthogonality  among  the  Lanezos  vectors.  More 
specifically,  significant  components  of  y,-,  the  converged  Ritz  vector,  creep 
into  subsequent  yj’s,  causing  spurious  copies  of  Ritz  pairs  to  be  generated  by 
the  Lanezes  process. 

Several  remedies  for  the  loss  of  orthogonality  have  been  proposed.  Paige 
suggests  full  reorthogonalization,  in  which  the  current  Lanezos  vector,  qj,  is 
orthogonalized  against  all  previous  Lanezos  vectors  [PaiTlj.  Full  reorthogo¬ 
nalization  becomes  increasingly  expensive  as  j  grows.  Cullum  and  Willoughby 
advocate  a  method  in  which  the  lack  of  orthogonality  is  ignored  and  sophis¬ 
ticated  heuristics  are  used  to  detect  the  eigenvalues  that  are  being  sought 
among  the  many  spurious  eigenvalues  that  are  generated  [CW85].  Simon 
proposes  a  scheme  called  partial  reorthogonalization  in  which  estimates  for 
the  orthogonality  of  the  current  Lanezos  vector,  yj,  against  all  previous  Lane¬ 
zos  vectors  are  inexpensively  computed.  Based  on  the  estimates,  a  small 
set  of  the  previous  Lanezos  vectors  are  orthogonalized  against  qj  [Sim84]. 
Partial  reorthogonalization  maintains  semi-orthogonality  among  the  Lane¬ 
zos  vectors.  For  all  q  vectors,  semi-orthogonality  is  defined  by: 

q^qi<(^^^  ii^j-  (26) 

“Semi-orthogonality”  among  the  q  vectors  guarantees  that  the  T  matrix  gen¬ 
erated  by  the  Lanezos  algorithm  executed  with  roundoff  errors  will  be  the 
same  up  to  working  precision  as  the  T  matrix  generated  using  exact  arith¬ 
metic  [PN085].  Selective  orthogonalization  is  used  by  LANZ  in  an  adapted 
form  to  maintain  semi-orthogonality  among  the  Lanezos  vectors  [PS79].  The 
strategy  of  selective  reorthogonalization,  as  proposed  by  Parlett  and  Scott, 
orthogonalizes  the  current  residual  vector,  r^,  and  the  last  Lanezos  vector, 
yj_i,  at  the  beginning  of  each  step  in  the  Lanezos  algorithm  against  “good” 
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Ritz  vectors  (more  details  of  how  this  occurs  will  be  given  in  Section  5). 
“Good”  Ritz  vectors  are  those  which  correspond  to  Ritz  values  for  which 
the  value  of  /?j,-  is  below  1|  >4  j|.  A  low  value  suggests  that  the  Ritz 
value  is  converging  and,  therefore,  from  Equation  25,  [  yfqj+i  \  is  increasing. 
The  value  of  ||  A  ||  is  used  to  ensure  that  the  quantity  |  yjqj+i  |  never 

rises  above  As  a  result,  semi-orthogonality,  as  defined  in  Equation  26, 

is  maintained. 
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5  The  LANZ  Algorithm 

5.1  The  Spectral  Transformation 

In  order  to  use  Lanczos’s  method  to  find  the  lowest  eigenvalues  or  the  eigen¬ 
values  closest  to  some  value,  a,  of  Kx  =  XMx,  a  transformation  of  the 
problem  must  be  made.  The  Lanczos  algorithm  described  in  Section  4  is 
applicable  only  to  Ax  =  \x.  Two  transformations  are  available  when  K  and 
M  are  symmetric  and  M  is  positive  semi-definite.  Each  transformation  in 
this  section  will  be  represented  by  a  capital  letter  that  has  no  other  mean¬ 
ing.  Transformation  A,  proposed  by  Ericsson  and  Rube,  replaces  A  with 
W'^{K  -  aM)-'^W  to  yield, 

W^(K  -  Wy  =  vy,  (27) 

where  M  =  WW^ ,  y  =  W^x,  and  A  =  <7  -|-  l/v  [ER80].  Transformation  B, 
suggested  by  Nour-Omid,  Parlett,  Ericsson  and  Jensen,  replaces  A  with  {K  — 
aM)~^M  and  uses  the  M-inner  product,  because  the  operator  is  no  longer 
symmetric  [NOPEJ87].  Note  that  M  does  not  form  a  true  inner  product 
because  M  is  not  positive  definite.  This  semi-inner  product  is  acceptable, 
however,  because  the  only  situation  in  this  algorithm  in  which  x'^Mx  =  0, 
for  a  non-trivial  x,  is  when  /^j+i  =  o>  which  indicates  exact  convergence  in 
the  Lanczos  algorithm.  (Hereafter,  the  semi-inner  product  will  be  referred 
to  as  just  an  inner  product).  Equation  1  becomes 

{K  —  <tM)~^Mx  =  i/x,  (28) 

where  the  eigenvalues  of  the  original  system  can  be  recovered  via 

X  =  <T  +  l/iz.  (29) 

Transformation  B  is  a  shifted  inverted  version  of  Equation  1  by  virtue  of  the 
following  steps.  Substituting  Equation  29  in  Equation  1  yields 

Kx  —  aMx  =  IjvMx.  (30) 

Then,  solving  for  x  and  multiplying  by  v  gives 

vx  =  {K  —  aM)~^  Mx.  (31) 
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1)  Choose  an  initial  vector,  guess 

2)  To  =  {K  —  oM)~^M guess  (Purge  n(B)  from  guess) 

3)  Pi  =  Mro 

4)  =  {rlpiy^ 

5)  For  i  =  1,  maximum  number  of  iterations 

6)  Reorthogonalization  phcise 

7)  =  Tj-iM 

8)  Pi  =  Pjl^j 

9)  (/<'  —  aM)rj  =  pj  (symmetric  indefinite  solve) 

10)  rj  =  Tj  - 

11)  aj  =  rjpj 

12)  r_,-  =  r_,  -  qjOj 

13)  Pi+i  =  Mrj 

14)  /?j+i  =  (rjpj+i)'/^ 

15)  Compute  the  eigenvalues  of  Tj  and  the  corresponding  error  bounds 

16)  Compute  any  converged  Ritz  vector 

17)  Halt  if  enough  eigenvalues  have  been  found 

18)  End  of  Loop 


Figure  2:  The  Lanczos  Algorithm  Using  Transformation  B 

Transformation  B  is  superior  to  transformation  A  in  both  storage  and 
operation  count  requirements  [NOPEJ87].  Transformation  B  requires  some 
modifications  to  the  original  algorithm,  including  the  solution  of  the  system 
{K  —  aM)x  =  y  for  z,  and  the  use  of  the  M-inner  product.  The  vector  p  has 
also  been  added  to  the  original  Lanczos  algorithm  to  hold  the  intermediate 
quantity  Mrj.  In  the  initialization  step,  the  initial  guess  vector  is  multiplied 
by  B  to  ensure  that  tq  is  contained  in  r(B),  where  B  =  [K  —  crM)~^M.  The 
efficient  implementation  of  these  operations  is  described  in  later  sections.  The 
modified  algorithm  is  shown  in  Figure  2  [NOPEJ87].  Reorthogonalization  in 
step  6  is  much  the  same  as  that  described  in  Section  4,  with  the  exception 
that  the  reorthogonalization  is  done  in  the  M-inner  product  [NOPEJ87]. 

If  the  matrix  M  is  singular,  then  n{B)  might  not  be  null.  The  eigen¬ 
vectors  of  Equation  1  have  no  components  in  n{M)  (also,  n(M)  =  n{B)) 
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[NOPEJ87].  In  exact  arithmetic,  if  the  starting  vector,  of  the  Lanczos 
process  is  restricted  to  r{B),  then  all  subsequent  Lanczos  vectors  will  be  re¬ 
stricted  to  r(B),  because  they  are  computed  by  multiplying  by  B.  However, 
in  finite  precision  arithmetic,  roundoff  error  allows  components  of  subsequent 
Lanczos  vectors  to  be  in  n(B);  therefore,  the  Ritz  vectors  calculated  from 
them  will  have  components  in  n{B)  [NOPEJ87].  Purifying  these  Lanczos 
vectors  is  an  expensive  process,  but  a  method  exists  that  instead  will  in¬ 
expensively  purify  the  calculated  Ritz  vectors.  The  vector  u;,-  is  computed, 
where  Wi  is  a  j  -f  1-length  vector  whose  first  j  components  are  calculated  as, 


Wi  =  il/9i)(TjSi), 

(32) 

and  whose  last  component  is 

{si{M+j)/ei. 

(33) 

Equation  21  then  becomes: 

Vi  =  Qi+iWi, 

(34) 

where  Wi  has  replaced  s,-. 

5.2  Transformations  for  Buckling 

Transformations  A  and  B  are  not  applicable  to  the  buckling  problem  because 
Kg  can  be  indefinite.  Transformation  A  fails  because  it  requires  the  Choleski 
factorization  of  Kq-  Transformation  B  fails  because  it  requires  the  use  of  a 
A’c-inner  product  which  would  be  an  indefinite  inner  product  and  introduce 
complex  values  into  the  calculations.  Transformation  C,  suggested  for  the 
buckling  problem  in  [Ste89b],  uses  the  operator  {K  -f  (jKcY^K.  The  trans¬ 
formation  {K  —  ctKg)  is  actually  suggested  in  [Ste89b],  but  {K  -f  cKq)  is 
preferred  because  it  yields  the  correct  inertia  for  the  buckling  problem  (note 
that  the  inertia  referred  to  here  is  not  the  inertia  of  a  physical  body,  it  is 
the  definition  of  inertia  used  in  Sylvester’s  inertia  theorem  relating  to  the 
number  of  positive,  negative,  and  zero  eigenvalues  of  a  matrix).  The  inertia 
of  {K  -|-  (tKg)  reveals  how  many  eigenvalues  of  Equation  3  are  less  than  a. 
Transformation  C  can  be  derived  from  Equation  3  by  substituting  ai//{t/  —  l) 
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for  A  in  Equation  3,  multiplying  each  side  by  {u  —  1),  rearranging  terms,  and 
finally,  multiplying  each  side  by  {K  +  crKG)~^  to  yield 

ux  =  {K  T  (jKgY^ Kx.  (35) 

To  recover  A,  use  Equation  36. 

A  =  <Tt//(z/  —  1).  (36) 

Transformation  C  requires  that  a  be  non-zero.  The  factorization  of  {K  -f 
(tKg)  and  the  use  of  the  /{’-inner  product  are  necessary  for  transformation 
C. 

In  exact  arithmetic,  the  convergence  rate  of  the  Lanczos  algorithm  is  the 
same  for  transformations  B  and  C  when  <t  is  fixed  (B  performs  the  same 
transformation  on  the  spectrum  as  A).  The  Kcmiel-Page-Saad  theory  is  used 
to  explain  the  convergence  of  eigenvalues  when  using  the  Lcinczos  algorithm 
and  is  now  introduced  to  allow  a  comparison  of  the  transformations  and 
to  show  the  effects  of  moving  a  on  the  convergence  rate  [SaaSO].  Three 
definitions  that  will  be  useful  in  this  explanation  are; 

Ki  =  n  (^m  -  *^.n/)/(»'m  “  (37) 

m5=l 

7i  =  1  -f  2(i/i  -  i/.+i)/(i/,+i  -  !/,„/),  (38) 

and  the  Chebyshev  polynomial, 

C7„(x)  =  l/2((x  (x^  -  +  (x  -  (x^  -  l)’/2)”).  (39) 

The  bound  on  the  difference  between  an  eigenvalue  of  the  Tj  matrix,  0,-,  and 
an  eigenvalue  of  the  transformed  system,  i/,,  at  the  jth  step  of  the  Lanczos 
algorithm  is 

Q<Ui-e]<{ui-  Vinj){Klian  u;(xi,ro)/Cj_i(7i))^,  i/,  >  r/.+j.  (40) 

The  tan  a;(x,-,ro)  is  determined  by  the  angle  between  the  eigenvector,  x,-, 
associated  with  i/i  and  the  starting  Lanczos  vector,  rp.  Because  the  angle  be¬ 
tween  X,  and  To  does  not  change  during  the  Lanczos  algorithm  and  because 
Kf  does  not  vary  greatly,  the  term  that  governs  the  rate  of  the  convergence 
is  Cj-i{‘Ki).  As  j  increases,  Cj-i('fi)  grows  more  quickly  for  large  (pi  than  for 


16 


Eigenvalue 

order 

f  1  IT*I 

a  ~  25.9 

lllHi 

<t>i 

4^1 

26 

■Oil 

0.2199 

0.3214 

4.2857 

42.3442 

30 

mi 

hhb 

- 

- 

- 

100 

IBB9 

i 

- 

- 

- 

Figure  3:  Effects  of  transformation  on  eigenvalue  separation 

small  (f)i  {(f)i  is  a  term,  |  (i',  — i^,+i)/(t'i+i  —  I'm/)  I,  obtained  from  the  definition 
of  7).  The  4>i  reflect  the  separation  of  individual  eigenvalues  from  their  neigh¬ 
bors  relative  to  the  remaining  width  of  the  spectrum.  Transformations  are 
used  to  increase  (f>i  for  the  desired  eigenvalues  by  transforming  the  spectrum 
such  that  the  desired  eigenvalues  are  well-separated  from  other  eigenvalues. 
It  can  be  shown  that,  if  used  with  the  same  a,  transformations  B  and  C  have 
the  same  effect  on  the  However,  moving  o  closer  to  a  desired  eigenvalue, 
A,-,  increases  the  corresponding  4>i  (and  therefore  speeds  convergence  of  By  to 
A,-).  The  increase  in  as  <7  is  moved  closer  to  Aj  is  shown  in  Figure  3.  Thus, 
as  cr  is  moved  closer  to  Ai  the  convergence  of  to  Ai  is  speeded  up. 

Transformations  B  and  C  have  the  same  effect  on  the  convergence  rates 
of  the  Lanczos  process  and  C  can  be  used  in  both  the  buckling  and  vibration 
problems,  so  the  question  arises  “Why  not  use  transformation  C  for  both  the 
buckling  and  vibration  problems?”  Although  B  and  C  have  the  same  effect 
in  exact  arithmetic,  they  each  yield  different  i/’s  for  the  same  cr.  In  finite 
precision  arithmetic,  transformation  C  is  inferior  to  transformation  B  when 
a  is  small  relative  to  the  desired  A’s.  Although  each  transformation  requires 
the  solution  of  a  linear  system  and  the  multiplication  of  a  matrix  by  a  vector, 
the  distribution  of  the  i/’s  for  small  o  in  transformation  C  leads  to  large 
errors  in  the  computation  of  the  A’s,  For  small  cr,  the  u's  of  transformation 
C  become  close  to  1  while  the  same  effect  is  not  seen  when  transformation 
B  is  used  (note  that  the  in  each  case  are  identical).  The  j/’s  that  result 
when  using  transformation  C  become  increasingly  close  to  1  as  cr  is  moved 
from  1.0  to  0.01,  whereas  the  u's  that  result  when  using  transformation  B 
show  little  change  (this  trend  is  shown  in  Figure  4).  Because  the  Lanczos 
algorithm  consists  of  the  same  calculations  for  each  transformation,  in  finite 
precision  arithmetic  the  algorithm  computes  perturbed  values  assumed  to  be 
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Figure  4;  Effects  of  transformation  on  eigenvalues 

of  the  form,  (1  +  c)i/,  instead  of  an  exact  v.  The  effect  of  this  perturbation  on 
the  computed  A’s  is  the  difference  between  the  two  transformations.  Recall 
that  in  transformation  B,  A  =  a  +  1/i/b,  and  that  in  transformation  C, 
A  =  <T  +  cr/(i/c  —  1)  (the  subscript  on  v  is  introduced  because  the  i/’s  are 
different  in  each  transformation  and  these  values  are  being  compared).  If  uc 
is  solved  for  in  terms  of  1/5,  then 

uc  =  auB  +  1.  (41) 

Let  S\b  and  6 Ac  denote  the  difference  between  the  true  A  and  the  A  computed 
using  transformations  B  and  C,  respectively.  These  fA’c  ari.,  expressed  in 


terms  of  perturbed  i/’s  in  the  following  equations: 

A  +  8Xb  =  (X  +  1/(1  +  (42) 

A  +  8\c  =  <7  -}-  o’/((l  -f  t)t'c  ~  !)■  (‘13) 

If  Equation  41  is  substituted  into  Equation  43,  then 

\ 8\c  —  <T -¥  a l(ouB tervB f).  (44) 

If  the  true  A  is  subtracted  from  each  side  of  Equations  42  and  44,  then 

^Ab  =  l/(i/B  +  ci/b)  -  1/j/B-  (45) 

^Ac  =  l/(j/B  +  ct'B  +  e/cr)  -  l/i/B-  (46) 


Thus,  the  error  in  the  computed  A  for  transformation  C  increases  sharply 
as  a  decreases.  To  show  the  increase  in  error  for  transformation  C,  the 
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errors  in  the  two  transformations  (from  Equations  45  and  46)  are  plotted  in 
Figure  5  for  A  =  10  and  e  =  0.0001.  From  this  derivation  and  the  graph  of 
the  functions,  it  is  clear  that  transformation  C  should  be  avoided  when  a  is 
small  compared  to  the  desired  A. 

From  the  previous  discussion  the  conclusion  can  be  drawn  that  trans¬ 
formation  B  is  preferred  to  C  whenever  possible.  However,  as  was  pointed 
out  previously,  transformation  B  is  not  applicable  to  the  buckling  probl'*m. 
Therefore,  a  new  transformation,  D,  which  transforms  the  eigenvalues  in  the 
same  fashion  as  transformation  B  (when  tr  =  0)  is  introduced.  Transforma¬ 
tion  D  can  be  used  with  an  indefinite  Kg  matrix  but  can  only  be  used  when 
a  is  0.  Transformation  D  is  derived  from  Equation  3  in  the  following  steps 
[CW85];  first,  substitute  Xjv  for  A  and  then  multiply  each  side  by  K~^v  to 
yield 

ux  =  K-^Kox;  (47) 

next,  expand  the  implicit  identity  matrix  in  each  side  as  /  =  C~^C^,  where 
K  =  CC^,  let  y  =  C^x,  and,  finally,  multiply  each  side  by  to  yUd 

vy  =  C-'^KcC-^y.  (48) 

The  operator  for  transformation  D  is  C~^ Kg(^~^ •  This  transformation  re¬ 
quires  the  Choleski  factorization  of  K  and  uses  the  standard  inner  product. 
The  eigenvectors,  a:,  must  be  recovered  via  the  solution  of  a  triangular  linear 
system,  using  the  foregoing  equation  for  computing  y.  When  an  initial  non¬ 
zero  guess  for  cr  exists,  the  method  used  in  LANZ  for  solving  the  buckling 
problem  uses  transformation  C  exclusively;  when  an  initial  guess  for  a  isn’t 
available,  the  method  used  begins  by  using  transformation  D  with  a  at  0, 
and  then  switches  to  transformation  C  when  a  shift  of  cr  is  needed  (the  use 
of  shifting  will  be  described  in  the  next  subsection).  Thus,  the  use  of  trans¬ 
formation  C  with  small  o  is  avoided,  and,  yet,  the  advantage  resulting  from 
shifting  is  maintained. 

5.3  The  Use  of  Shifts 

An  efficient  algorithm  for  computing  several  eigenvalues  requires  that  the 
shift,  (7,  be  moved  as  close  as  possible  to  the  eigenvalues  that  are  being  com¬ 
puted.  The  closer  that  a  is  to  an  eigenvalue  being  computed,  the  faster  the 
convergence  to  that  eigenvalue.  Ericcson  and  Ruhe  describe  a  method  for 
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selecting  shifts  and  deciding  how  many  Lanczos  steps  to  take  at  each  shift 
[ER80].  The  efficiency  of  the  algorithm  depends  on  how  well  these  shifts  are 
chosen,  and  how  many  Lanczos  steps  are  taken  at  each  shift.  Normally,  the 
most  expensive  step  in  the  Lanczos  process  is  the  factorization  of  (K  —  aM), 
which  must  be  done  once  for  each  shift.  But  if  j  becomes  large,  the  calcu¬ 
lation  of  an  eigenvector.  Equation  21,  can  be  very  expensive.  In  addition, 
many  steps  could  be  required  to  converge  to  an  eigenvalue  if  the  shift  is  far 
away  from  this  eigenvalue,  or  if  the  eigenvalue  is  poorly  separated  from  its 
neighbors.  The  method  used  by  Ericcson  and  Ruhe  first  estimates  that  r 
eigenvalues  will  converge  in  j  steps,  where  r  is  calculated  based  on  the  fact 
that  the  eigenvalues  of  Equation  1  are  linearly  distributed.  A  cost  analysis  of 
the  algorithm  is  performed,  and  from  this  analysis,  a  determination  of  how 
many  steps  to  take  at  a  shift  is  made.  Their  choice  of  shift  depends  on  the 
inertia  Ccdculation  in  the  factorization  step  [ER80]. 

In  the  problems  from  the  NASA  structures  testbed  [Ste89a],  the  distribu¬ 
tion  of  the  eigenvalues  is  often  anything  but  linear.  This  distribution  makes 
the  above  estimates  invalid  and  requires  a  different  method  for  deciding  how 
many  steps  to  take  at  each  shift.  Instead  of  calculating  the  number  of  steps 
to  take  for  a  shift  prior  to  execution,  LANZ  uses  a  dynamic  criterion  to 
decide  when  to  stop  working  on  a  shift.  Later,  in  Section  6,  a  cost  analy¬ 
sis  of  the  Lanczos  algorithm  shown  in  Figure  2  is  given.  This  cost  analysis 
is  part  of  the  basis  for  the  dynamic  shifting  algorithm.  The  implementa¬ 
tion  of  LANZ,  however,  uses  execution  timings,  rather  than  a  precalculated 
cost  analysis,  because  the  cost  analysis  is  different  for  each  architecture  on 
which  the  implementation  is  run.  These  timings  let  LANZ  know  how  long 
each  Lanczos  step  takes  and  the  cost  of  factorization  at  a  new  shift.  In 
addition  to  the  timing  information,  the  estimated  number  of  steps  required 
for  unconverged  eigenvalues  to  converge  is  calculated.  The  step  estimate 
is  computed  by  tracking  the  eigenvalues  of  Tj  (and  the  corresponding  error 
bounds)  throughout  the  execution  of  the  Lanczos  algorithm.  The  method  for 
this  tracking  and  computation  of  eigenvalues  is  described  in  Section  7.  With 
estimates  for  the  number  of  steps  required  for  eigenvalues  to  converge  and 
the  time  needed  for  a  step  (or  new  factorization)  to  execute,  the  decision  to 
keep  working  on  a  shift  or  choose  a  new  shift  can  be  made  efficiently. 

The  selection  of  a  new  shift  depends  on  the  information  generated  dur¬ 
ing  the  execution  of  LANZ  on  previous  shifts  and  on  inertia  calculations  at 
previous  shifts.  The  inertia  calculations  are  used  to  identify  any  eigenvalues 
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that  were  skipped  over  in  previous  steps,  including  eigenvalues  of  multiplicity 
greater  than  one.  The  estimated  eigenvalues  and  their  error  bounds  gener¬ 
ated  during  the  execution  of  Lanczos  enable  the  selection  of  a  new  shift  based 
on  these  estimates  (if  no  eigenvalues  were  skipped  in  the  previous  run).  Be¬ 
cause  convergence  to  an  eigenvalue  is  faster  if  the  shift,  a,  is  chosen  to  be 
close  to  that  eigenvalue,  LANZ  seeks  to  choose  a  shift  that  is  near  to  the 
desired  unconverged  eigenvalue.  However,  a  must  not  be  chosen  so  close  to 
an  eigenvalue  that  the  system  (K  —  crM)x  =  y  becomes  very  ill-conditioned. 
In  the  authors’  experience,  Lanczos’s  method  generates  approximations  to 
all  nearby  eigenvalues,  so  that  even  if  an  eigenvalue  is  not  converged  to,  an 
estimate  along  with  an  error  bound  for  a  nearby  eigenvalue  is  generated.  If 
the  initial  Lanczos  vector  is  not  deficient  in  the  eigenv'ector  corresponding 
to  an  eigenvalue,  and  if  that  eigenvalue  is  close  to  cr,  then  the  Kaniel-Page- 
Saad  theory  shows  that  Lanczos’s  method  will  generate  an  estimate  to  that 
eigenvalue  in  a  few  steps  [Kan66].  In  practice,  even  if  the  initial  Lanczos 
vector  is  deficient  in  the  eigenvector,  round-off  error  will  quickly  make  that 
eigenvector  a  component  of  the  Lanczos  vectors  [ER80]. 

When  a  new  shift  is  chosen,  the  initial  Lanczos  vector  is  chosen  to  be 
a  weighted  linear  combination  of  the  Ritz  vectors  corresponding  to  the  un¬ 
converged  Ritz  values  of  the  previous  shift,  w’here  the  weights  are  chosen  as 
the  inverse  of  the  error  bounds  of  those  Ritz  values  [PS79].  The  number  of 
Ritz  vectors  chosen  is  based  on  the  number  of  eigenvectors  still  to  be  found 
and  the  number  of  Ritz  vectors  with  “reasonable”  error  bounds.  To  exam¬ 
ine  the  effect  of  using  a  linear  combination  of  Ritz  vectors  rather  than  a 
random  vector  ais  the  initial  vector,  several  structural  engineering  problems 
(both  buckling  and  vibration)  were  solved  using  both  methods  for  selecting 
an  initial  vector  (for  an  explanation  of  the  problems  used,  see  Section  9).  The 
number  of  steps  taken  by  each  method  to  get  the  same  number  of  eigenvalues 
is  given  in  Figure  6  and  from  this  it  appears  that  using  the  linear  combination 
of  Ritz  vectors  is  always  as  good  or  better  than  choosing  a  random  vector. 

To  give  the  reader  a  clearer  picture  of  the  overall  execution  flow  of  LANZ, 
a  flow  chart  is  shown  in  Figure  7. 

5.4  Selective  Orthogonalization 

The  method  used  to  maintain  “semi-orthogonality”  among  the  Lanczos  vec¬ 
tors  is  a  modification  of  selective  orthogonalization  as  proposed  by  Parlctt 
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Problem 

Size 

Random 

Linear  Combination 

Mast 

(buckling) 

n  =  1980 
/3  =  58 

19  steps 

19  steps 

Mast 

(vibration) 
(diagonal  M) 

n  =  1980 

0  =  5S 

10  steps 

10  steps 

Mast 

(vibration) 
(non-diagonal  M) 

n  =  1980 
/S  =  58 

11  steps 

8  steps 

Cylinder 

(buckliug) 

n  =  7644 
p  =  385 

3  steps 

2  steps 

Figure  6:  Methods  for  choosing  initial  vector 

and  Scott  [PS79].  As  described  in  Section  4,  the  Lanczos  vectors  do  not 
lose  orthogonality  until  a  Ritz  pair  converges  to  an  eigenpair  of  the  system. 
At  this  point,  significant  components  of  the  Ritz  vector  that  have  converged 
begin  to  creep  into  subsequent  Lanczos  vectors.  Selective  orthogonalization 
monitors  the  level  of  orthogonality  between  converged  Ritz  vectors  and  Lanc¬ 
zos  vectors.  If  at  step  j,  the  orthogonality  level  between  a  converged  Ritz 
vector  and  the  Lanczos  vector  qj  is  above  a  given  threshold  (Parlett  and  Scott 
suggest  that  be  used),  the  Ritz  vector  is  purged  from  both  qj  and  qj-i. 

Parlett  and  Scott  use  the  following  derivation  to  monitor  the  level  of 
orthogonality  [PS79j.  In  finite  precision  arithmetic  the  Lanczos  recurrence 
relation  is 

=  Bqj  -  Qjqj  -  Pjqj-i  -b  /,  (49) 

where  B  is  one  of  the  operators  described  in  Subsection  5.1  and  /,  represents 
the  roundoff  error.  The  bound  on  ||  /,  ||  is  ce  ||  B  ||  where  c  is  a  constant 
independent  of  j  derived  from  B.  If  each  side  of  Equation  49  is  multiplied 
by  ^  where  y  is  a  Ritz  vector  computed  from  Equation  21,  then 

=  y^ Bqj  -  y'^Ojqj  -  -t-  y^/,.  (50) 

Multiply  each  side  of  Equation  23  by  s  to  yield 

By  =  0y-^r  (51) 
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Start  with  the  user’s  initial  shift 
or  use  0  if  none  is  specified 


Figure  7:  Execution  flow  of  LANZ 


24 


where  r  is  not  rj  and  will  be  discussed  below.  If  the  bound  for  ||  /,•  ||  and 
Equation  51  are  substituted  into  Equation  50,  Equation  52  results. 

^i+iy^9j+i  =  +  r^  -  oijy'^)qj  -  +  cc  ||  B  ||  (52) 

Parlett  and  Scott  assume  that  r  =  qk+i^ki  for  some  k  <  j,  and  therefore, 
that 

1  r'^qj  1<  ^ki  1  1  .  (53) 

They  then  state:  1)  j  qJ^^qj  |<  because  semi-orthogonality  among  the 
Lanczos  vectors  is  maintained,  and  2)  if  y  is  a  converged  Ritz  vector,  then 
/3ji  is  less  than  or  equal  to  ||  B  |1.  Fact  2  is  caused  by  the  definition 
of  a  “good”  Lanczos  vector  given  in  Section  4.  If  facts  1  and  2,  along  with 
Tj  =(  y^qj  I,  are  substituted  into  Equation  52,  then  Equation  54  Is  derived. 

Tj+i  <i\6-oj\  Tj  +  fijTj.i  +  c  !|  B  11  +ce  II  B  ||)//?j+i  (54) 

Because  c  and  jj  B  1|  are  small  and  not  readily  available,  Parlett  and  Scott 
ignore  these  terms  and  derive  the  following  recurrence  relation  for  Tj+ii 

Tj+l  <  (I  5  -  Oj  1  Tj  -f  ^yr,_j)//3j+i.  (55) 

The  values  tq  and  Ti  are  initialized  to  e  and  whenever  y  is  purged  from  qj, 
Tj  is  reset  to  c.  From  this  recurrence  relation,  the  conclusion  can  be  drawn 
that  the  Lanczos  vectors  should  be  orthogonalized  in  pairs. 

This  recurrence  relation  predicts  the  actual  level  of  orthogonality  very  well 
in  practice  with  two  exceptions.  The  first  problem  occurs  when  calculating 
Tj+i  after  y  hcis  been  computed  at  step  j  —  1  and  purged  from  and  qj.  In 
this  situation,  a  small  increase  in  Tj+i  over  Tj  and  Tj-i  is  expected.  However, 
a  large  increeise  occurs.  This  increaise  is  caused  by  the  assumption  on  Parlett 
and  Scott’s  part  that  ]  qJ^-^qj  1<  when  in  fact  that  equation  only  holds 
when  k  <  j  —  The  quantity,  |  qj^^qj  1,  is  1  when  k  =  j  —  Thus, 
Equation  55  holds  when  k  <  j  —  but  when  A:  =  j  —  1  Equation  55  becomes: 

Tj+i  <  (I  ^  -  aj  1  Tj  +  PjTj^i  -I-  ^y_i,,)/;3j+i.  (56) 

The  second  problem  arises  when  using  Equation  34  to  compute  y.  In  this 
case  r  =  Oy  ^j^i^qj  -f-  /3j^ijBqj/6  assuming  y  is  computed  at  step  j  —  1; 
therefore, 

1  1<  Stj  +  4  (57) 
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The  recurrence  relation  used  for  then  becomes: 

Tj+i  <  (I  ^  -  Qfj  I  Tj  +  H-  (58) 

If  y  has  been  computed  at  step  j  —  2,  then 

I  l<  GTj  +  (59) 

and,  because  the  second  term  is  very  small,  the  recurrence  relation  for  tj+i 
becomes 

Tj+i  <{\e-aj\  Tj  +  +  /9j_2,./?j7^)/;5j+i.  (60) 

5.5  Orthogonalization  Methods 

Selective  orthogonalization  and  partial  reorthogonalization  are  the  two  best 
known  orthogonalization  methods  other  than  full  reorthogonalization.  Par¬ 
tial  reorthogonalization  monitors  the  orthogonality  of  qj  and  q^+i  versus  the 
other  Lanczos  vectors.  Partial  orthogonalization  measures  the  orthogonality 
between  qj  and  qk,  u>jki  via  the  recurrence  relation  defined  in  the  following 
set  of  equations  [Sim84]: 

Wfcfc  =  1  for  k  =  (61) 

<^kk-i  =  for  k  =  2,...,j,  (62) 

0j+i^j+ik  =  ^j+iujjk+i  +  (a*  —  Qj)u;jk  -1-  Pk^jk-y  —  /3jU)jk-i  +  qj fk  —  9k  fj  (63) 
and  Wjfc+i  =  u}k+ij  for  1  <  k  <  j.  (64) 

Simon  has  stated  that  the  theoretical  relationship  between  partial  reorthogo¬ 
nalization  and  selective  orthogonalization  is  not  known  [Sim84].  The  follow¬ 
ing  discussion  explains  the  relationship  between  the  two  methods.  If  yf  on 
the  left  side  of  Equation  50  is  expanded  to  sJQJ,  and  the  resulting  equation 
is  divided  by  the  right  side  becomes  Tj+jj,  yielding 

Qj  9j+i  —  '^j+i,i  (65) 

From  the  recurrence  relation  for  partial  reorthogonalization,  the  product 
Qjqj+i  is  the  vector  where  k  runs  from  1  to  j.  Thus  the  relation¬ 

ship  between  the  u's  and  the  r’s  is  governed  by 

^^^'j+i,k  =  T-j+i./t,  where  k  =  (66) 
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This  relationship  in  Equation  66  has  also  been  observed  in  numerical  exper¬ 
iments  run  by  the  authors.  When  a  Ritz  vector,  j/,,  is  purged  from  and 
therefore  Tj+i,,-  becomes  small,  the  for  which  the  values  of  are  the 

largest  decrease  significantly. 
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6  Execution- Time  Cost  Analysis 

An  analysis  of  the  execution-time  cost  of  the  Lanczos  algorithm  when  using 
transformation  B  is  given  in  Appendix  A.  Because  the  costs  for  the  other 
transformations  are  almost  identical  their  cost  will  not  be  analyzed  in  this 
section.  The  analysis  has  two  purposes:  1)  to  allow  the  computation  of  the 
tradeoff  point  between  re-starting  the  Lanczos  algorithm  at  a  different  shift 
and  continuing  with  the  current  shift,  and,  2)  to  allow  analysis  of  performance 
on  parallel  and  vector/parallel  machines.  Throughout  the  analysis,  only  the 
cost  of  floating  point  operations  is  included.  The  assumption  is  made  for 
this  analysis  that  the  matrices  have  a  band  structure  and  that  the  Bunch- 
Kaufman  method  (or  an  LDL^  decomposition)  is  used  for  the  solution  of 
the  linear  systems.  In  order  to  simplify  the  analysis,  the  assumption  is  made 
that  the  bandwidth  of  K  is  greater  than  or  equal  to  the  bandwidth  of  M.  This 
assumption  has  no  effect  on  the  analysis  other  them  to  avoid  making  some  of 
the  operation  costs  conditional  on  which  matrix  has  the  larger  bandwidth. 
Much  of  this  analysis  does  not  take  into  account  “end  conditions,”  such  as 
those  that  arise  near  the  end  of  a  factorization  when  less  work  needs  to  be 
done  than  in  the  middle  of  a  factorization.  Thus,  some  of  the  expressions  are 
necessarily  approximations. 

Several  observations  regarding  the  shift  tradeoff  c<m  be  made  from  the 
cost  analysis;  1)  the  single  most  expensive  step  in  the  algorithm  is  the  fac¬ 
torization  phase  (2B)  which  is  0(n/i^),  2)  the  cost  of  the  reorthogonalization 
phase  increases  as  j  increases  because  of  the  increeising  number  of  “good” 
Ritz  vectors  to  orthogonalize  against,  3)  the  cost  of  computing  a  converged 
Ritz  vector  is  based  on  p  and  therefore  increases  rapidly  as  j  increases,  and 
4)  the  cost  of  the  rest  of  the  operations  in  the  program  loop  is  not  affected  by 
growth  in  j  (with  exception  of  step  15  but  this  step  is  not  costly  enough  to 
consider).  To  illustrate  how  the  costs  of  the  four  operation  groups  per  Lanc¬ 
zos  step  change,  the  number  of  floating  point  operations  per  step  is  plotted 
against  j,  the  number  of  Lanczos  steps,  in  Figure  8.  The  costs  in  the  Fig¬ 
ure  8  are  from  an  actual  LANZ  run  during  which  a  new  shift  was  selected 
beginning  at  step  22.  These  costs,  of  course,  will  differ  for  each  problem. 
From  the  cost  analysis  and  this  graph  it  can  be  seen  that  a  tradeoff  exists 
between  the  benefits  of  a  taking  a  new  shift  (smaller  reorthogonalization  and 
eigenvector  computation  cost  as  well  as  accelerated  convergence  to  desired 
eigenvalues)  and  the  benefit  of  continuing  work  on  the  current  shift  (avoiding 
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Figure  8:  Operation  costs  plotted  against  Lanczos  step  number 
the  cost  of  refactorization). 

The  LANZ  implementation  uses  actual  timings  of  the  various  steps  dur¬ 
ing  the  current  run  to  analyze  the  tradeoff,  rather  than  substituting  values 
for  the  cost  of  various  operations  for  the  machine  being  used.  The  use  of 
timings  is  simpler  to  implement  and  makes  the  code  more  portable. 

i 

} 
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7  Solution  of  the  system  Ts  =  6s 

The  size,  of  the  tridiagonal  system,  Ts  =  0s,  generated  by  the  Lanczos 
algorithm  is  1  at  step  1  and  increases  by  1  at  each  Lanczos  step.  The  size 
of  T  is  usually  very  small  compared  to  the  size  of  the  original  problem,  n. 
Therefore,  the  time  used  to  solve  the  tridiagonal  system  does  not  greatly 
affect  the  sequential  execution  time  of  the  LANZ  algorithm.  However,  if  the 
rest  of  the  algorithm  is  parallelized,  the  solution  of  the  tridiagonal  system 
could  well  become  a  large  factor  in  the  parallel  execution  time.  Parlett 
and  Nour-Omid  have  proposed  a  method  of  tracking  a  small  group  of  the 
eigenvalues  of  the  T  matrices  as  they  are  produced  by  the  Lanczos  algorithm. 
An  inexpensive  by-product  of  their  method  is  the  error  bounds  for  the  0j’s 
[PN085].  Their  algorithm  monitors  the  outermost  ^’s  whose  error  bounds, 
fiji,  indicate  that  they  will  converge  in  the  next  2  or  3  steps;  it  actually 
monitors  8  eigenvalues  at  a  time.  There  are  two  phaises:  1)  the  previous 
O's  and  their  error  bounds  are  updated  and  any  new  ^’s  are  detected,  and 
2)  converged  0's  are  detected  and  removed  from  the  data  structure.  This 
algorithm  is  not  suitable  for  use  by  LANZ  for  3  reasons:  1)  it  is  not  easily 
par^xilelizable,  2)  it  does  not  track  an  eigenvalue  for  many  steps  to  get  a 
convergence  rate  estimate,  and  3)  in  tests  run  by  the  authors,  it  often  failed. 

The  authors  have  developed  a  new  solution  method  that  is:  1)  inherently 
parallel,  2)  tracks  all  the  eigenvalues  of  T  from  step  to  step,  and  3)  has  been 
used  successfully  with  LANZ  to  solve  real  structures  problems.  The  method 
uses  information  from  step  j  —  1  to  solve  for  all  the  eigenvalues  and  their  error 
bounds  at  step  j.  It  uses  Cauchy’s  interlace  theorem,  shown  in  Equation  67, 
to  find  ranges  for  the  all  the  eigenvalues  (except  the  outermost  eigenvalues) 
of  Tj  from  the  eigenvalues  of  Tj-i- 

0i^^  <  0i  <  0i-^^  <  0]  <  0^jXl  (67) 

Cauchy’s  interlace  theorem  states  that  the  eigenvalues  of  Tj  interlace  those 
of  Tj+i  {Par80].  In  addition  to  the  interlace  property,  the  error  bounds,  /3ji, 
from  the  previous  step  can  be  used  to  provide  even  smaller  ranges  for  some 
eigenvalues.  If  good  error  bounds,  Pji,  are  not  available  for  the  outer  eigenval¬ 
ues  (the  interlace  property  only  gives  a  starting  point  for  these  eigenvalues), 
they  can  be  found  by  extending  an  interval  from  the  previous  extreme  eigen¬ 
value.  However,  a  property  of  the  Lanczos  algorithm  is  that  the  extreme 
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eigenvalues  are  usually  the  first  to  stabilize.  The  algorithm  for  the  method 
just  described  is  given  in  Figure  9.  For  simplicity,  the  algorithm  does  not 
show  the  code  for  handling  extreme  eigenvalues.  The  algorithm  requires  two 
subroutines,  the  root  finder  described  below  and  a  function,  numless,  that 
uses  spectrum  slicing  to  determine  the  number  of  eigenvalues  of  Tj  less  than 
a  value.  Details  of  how  to  efficiently  implement  these  subroutines  are  given 
by  Parlett  and  Nour-Omid  [PN085]. 

A  root  finding  method,  such  as  bisection  or  Newton’s  method,  can  be 
used  to  find  the  eigenvalues  in  the  ranges  given  by  the  algorithm  in  Figure  9 
[PN085].  Newton’s  method  is  preferred  for  its  fast  convergence  and  because 
it  generates  the  jth  element  in  s,-  as  a  by-product,  which  allows  for  the  inex¬ 
pensive  computation  of  the  error  bound  for  0,-  [PN085].  For  safety’s  sake,  the 
Newton  root  finder  is  protected  by  a  bisection  root  finder  to  ensure  that  New¬ 
ton’s  method  converges  to  the  desired  root.  If  the  Ritz  vector  corresponding 
to  a  particular  eigenvalue,  0,,  needs  to  be  computed,  inverse  iteration  can  be 
used  to  compute  s,-.  Because  the  calculation  of  every  eigenvalue  is  indepen¬ 
dent,  this  algorithm  is  inherently  parallel.  In  order  to  save  time,  it  may  be 
beneficial  to  keep  track  of  which  eigenvalues  of  T  have  stabilized,  as  these  do 
not  need  to  be  recomputed.  The  major  difficulty  in  parallelizing  this  algo¬ 
rithm  appears  to  be  load  balancing;  it  will  take  different  numbers  of  Newton 
iterations  to  find  each  eigenvalue,  and  only  occasionally  will  inverse  iteration 
be  necessary. 

The  algorithm  developed  by  the  authors  for  solving  the  tridiagonal  system 
also  tracks  the  eigenvalues  of  Tj  from  step  to  step.  This  tracking  is  necessary 
for  two  reasons.  First,  selective  orthogonalization  requires  the  computation  of 
the  Ritz  vectors  corresponding  to  the  eigenvalues  of  Tj  that  become  “good” 
(as  defined  in  Section  4)  at  step  j.  Those  Ritz  vectors  can  be  used  from 
step  j  -f  1  until  the  end  of  the  Lanczos  run  if  the  eigenvalues  in  T’i+i  (and 
subsequent  T^’s)  that  correspond  to  the  eigenvalues  in  Tj  can  be  identified. 
Second,  the  rate  of  convergence  of  a  particular  eigenvalue  is  predicted  by 
tracking  its  convergence  over  several  steps  (the  use  of  the  convergence  rate 
was  described  in  a  previous  section). 
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bounded[i]  =  0  ,  for  t  =  1,  j 
do  I  =  1,  j  —  1 

if  {{2* Pa  <  Oi  -  di-i)  and  {2*/3ji  <  -  5.))  then 

probe  =  6i  + 
less  =  numless(probe) 
if  (less  =  i)  then 
bounded  [t]  =  t 

else  /*  t  and  t  +  1  are  the  only  values  numless  will  return,  if 
it  returns  something  else,  a  grave  error  has  occurred  */ 
bounded[i  +  1]  =  t 
endif 
endif 
enddo 
do  i  =  1,  j 

if  (bounded[i]  =  0)  then 
leftbound  = 
rightbound  =  6i 

newtonroot  (leftbound  ,rightbound  ,new^i , new ) 
else  if  (bound [t]  =  i)  then 
leftbound  =  6i  —  0ji 
rightbound  =  di 

newtonroot(leftbound, rightbound, new0j, new /?j,) 
else  if  (bound[i]  =  t  —  1)  then 
leftbound  = 
rightbound  = 

newtonroot(leftbound,rightbound,new0,,new/?j,) 

endif 

enddo 


Figure  9:  Tridiagonal  Eigenvalue  Solver 
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8  The  Solution  of  the  system  {K  —  aM)x  =  y 

The  solution  to  the  possibly  indefinite  system, 

{K  —  aM)x  —  y,  (68) 

is  normally  the  most  time-consuming  step  in  the  Lanczos  algorithm  for  trans¬ 
formations  A,  B,  and  C  (unless  there  are  very  few  non-zero  elements  in 
(K  —  aM)).  Therefore,  it  makes  sense  to  try  to  optimize  this  step  of  the 
algorithm  as  much  as  possible.  Two  approaches  can  be  taken  to  solving 
this  system:  1)  the  use  of  direct  solution  methods,  or  2)  the  use  of  iterative 
solution  methods.  Because  the  problems  under  consideration  can  be  very 
ill-conditioned,  the  use  of  iterative  methods  has  been  avoided. 

Because  this  paper  is  focused  on  the  problem  in  which  K  and  M  are 
banded,  the  discussion  in  this  section  is  limited  to  the  banded  case.  In  the 
vibration  problem,  because  K  is  positive  definite  and  M  is  semi-positive 
definite,  if  <t  <  0,  then  the  system  in  Equation  68  is  positive  definite.  In  the 
buckling  problem,  because  K  is  positive  definite,  if  a  =  0,  then  only  K  must 
be  factored  because  transformation  D  is  used.  Because  K  and  M  are  always 
symmetric,  Choleski’s  method  can  be  used  to  solve  these  systems.  Choleski’s 
method  is  the  direct  method  of  choice  for  this  class  of  banded  linear  systems 
because  it  is  stable  and  results  in  no  destruction  of  the  bandwidth  [BKP76]. 
Choleski’s  method  is  used  by  LANZ  for  the  vibration  problem  whenever 
<7  <  0  and  in  the  buckling  problem  whenever  <7  =  0. 

The  system  in  Equation  68  can  be  indefinite  whenever  cr  >  0  in  the 
vibration  problem  and  may  be  indefinite  in  the  buckling  problem  when  a 
is  non-zero.  When  the  system  is  indefinite,  Choleski  factorization  will  fail 
because  a  square  root  of  a  negative  number  will  be  taken,  and  the  LDLF 
decomposition  is  not  stable  because  the  growth  of  elements  in  L  cannot  be 
bounded  a  priori  [Wil65].  The  methods  of  choice  for  factoring  a  full  symmet¬ 
ric  indefinite  matrix  are  the  Bunch-Kaufman  method  and  Aasen’s  method 
[BG76].  It  was  believed  that  both  methods,  however,  would  destroy  the 
structure  of  a  banded  system  and  not  be  competitive  with  Gaussian  elimi¬ 
nation  with  partial  pivoting,  which  does  not  destroy  the  band  structure  but 
ignores  symmetry  [BK77].  To  address  this,  the  authors  have  developed  a 
new  method  of  implementing  the  Bunch-Kaufman  algorithm  which  is  the 
method  of  choice  for  factoring  symmetric  indefinite  banded  systems  when 


the  systems  have  only  a  few  negative  eigenvalues  [JP89].  This  is  exactly  the 
case  which  arises  when  moving  the  shift  in  search  of  the  lowest  eigenvalues 
of  Equation  1  in  the  vibration  problem  and  is  often  the  case  in  the  buckling 
problem  as  well.  The  modified  algorithm  takes  full  advantage  of  the  sym¬ 
metry  of  the  system,  unlike  Gaussian  elimination,  and  is  therefore  faster  to 
execute  and  takes  less  storage  space.  LANZ  uses  this  algorithm  whenever 
the  system  can  be  indefinite.  As  an  additional  benefit,  the  inertia  of  the 
system  can  be  obtained  virtually  for  free  [BK77]. 

Regardless  of  which  factorization  method  is  used,  the  system  is  only  fac¬ 
tored  once  for  each  a.  After  the  factorization  has  taken  place,  each  tirne 
the  solution  to  Equation  68  is  required,  only  back  and  forward  triangular 
solutions  (and  a  diagonal  solution  in  the  Bunch-Kaufman  case)  must  be  ex¬ 
ecuted. 
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9  Performance  Analysis 

9.1  Vectorization 

From  the  analysis  in  Appendix  A  it  appears  that  vectorizing  LANZ  would 
result  in  significant  speedup  of  the  solution  procedure.  The  LANZ  code 
was  compiled  using  the  Convex  Vectorizing  Fortran  compiler  [Cor87].  The 
code  was  executed  using  double  precision  on  a  Convex  220  in  both  vector 
and  scalar  modes.  Seven  free  vibration  problems  and  five  buckling  problems 
of  varying  sizes  from  the  NASA  Langley  testbed  were  run.  The  problerns 
consisted  of  varying  sizes  of  three  different  structures.  The  first  structure  is  a 
thin  circular,  cylindrical  shell  simply  supported  along  its  edges.  The  buckling 
eigenvalues  for  this  structure  are  closely  spaced  and  present  a  challenge  for 
eigensolvers.  The  actual  finite  element  model  only  needs  to  model  a  small 
rectangle  of  the  cylinder  to  correctly  simulate  the  behavior  of  the  structure. 
A  plot  of  the  entire  cylinder  that  shows  the  15  degree  rectangle  of  the  cylinder 
that  is  modeled  is  given  Figure  20  of  Appendix  A.  The  two  lowest  buckling 
modes  for  an  axially-compressed  cylinder  are  are  also  plotted  in  Figure  21  of 
Appendix  A.  The  second  structure  is  a  composite  (graphite-epoxy)  blade- 
stiffened  panel  with  a  discontinuous  center  stiffener.  The  finite  element  model 
for  this  structure  is  shown  in  Figure  22  of  Appendix  A.  The  third  structure  is 
a  model  of  a  deployable  space  maist  constructed  at  NASA  Langley  Resecirch 
Center.  A  picture  of  the  deployable  mast  along  with  a  plot  of  the  finite 
element  model  is  shown  in  Figure  23  of  Appendix  A.  Descriptions  of  the  first 
two  structures  can  be  found  in  [Ste89a].  A  description  of  the  deployable  mast 
can  be  found  in  [HWHB86].  In  every  problem,  at  least  ten  eigenpairs  were 
found.  All  times  in  this  section  are  given  in  seconds.  In  each  problem,  n  will 
refer  to  the  number  of  equations,  and  /?  will  refer  to  the  semi-bandwidth  of 
the  K  matrix.  The  execution  times  for  the  vibration  problem  with  a  diagonal 
mass  matrix  are  given  in  Figure  10  where  a  speedup  due  to  vectorization  of 
up  to  7.83  is  shown.  Speedups  of  up  to  7.30  for  the  vibration  problem  with 
a  consistent  mass  matrix  are  given  in  Figure  11.  For  the  buckling  problem, 
speedups  of  up  to  7.79  can  be  observed  in  Figure  12.  These  speedups  are 
similar  to  the  speedups  obtained  by  other  linear  algebra  applications  on  the 
Convex  220.  From  these  comparisons  the  conclusion  can  be  drawn  that 
significant  speedup  of  the  solution  procedures  due  to  vectorization  can  be 
achieved. 


35 


Figure  10;  Vectorization  Results  for  the  Vibration  Problem  with  a  Diagonal 
Maas  Matrix 
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^  =  65 
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2.18 
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n  =  1824 

13  =  185 

4.69 

22.12 

4.72 

Cylinder 

n  =  7644 
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39.26 

263.73 

6.72 

Cylinder 

n  =  12054 
^  =  485 

82.88 

605.30 

7.30 

Panel 

n  =  477 
/3  =  142 

1.74 

5.18 

2.98 

Panel 

n  =  2193 
^  =  287 

10.61 

43.82 

4.13 

Figure  11:  Vectorization  Results  for  the  Vibration  Problem  with  a  Consistent 
Mass  Matrix 
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Mast 

n  =  1980 
/?  =  58 

5.16 

14.72 

2.85 
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II  II 

0.43 

1.01 
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Cylinder 

n  =  1824 
13=  185 

5.18 

27.54 

5.32 

Cylinder 

n  =  7644 
^  =  385 

70.32 

510.75 

7.26 

Cylinder 

n  =  12054 

13  =  485 

150.57 

1172.39 

7.79 

Figure  12;  Vectorization  Results  for  the  Buckling  Problem 


9.2  Comparison  With  Subspace  Iteration 

The  claim  was  made  in  Section  3  that  Lanczos’s  method  is  significantly  faster 
than  subspace  iteration.  The  results  presented  in  this  section  support  this 
claim.  The  LANZ  code  was  compared  with  the  E1G2  processor  from  the 
NASA  Langley  testbed  code.  The  E1G2  processor  uses  the  subspace  itera¬ 
tion  method  [Ste89b].  Both  codes  were  compiled  and  executed  as  in  Sub¬ 
section  9.1.  The  same  problems  that  were  solved  in  9.1  were  used  for  this 
comparison  in  which  the  the  lowest  ten  eigenvalues  were  sought.  Both  pro¬ 
grams  were  able  to  find  the  lowest  ten  eigenvalues  in  every  case,  although 
EIG2  took  an  unusually  large  number  of  iterations  (over  three  times  the  rec¬ 
ommended  maximum)  to  find  them  in  the  Mast  case  for  both  the  buckling 
and  free  vibration  problems.  The  Mast  problem  has  a  difficult  distribution 
of  eigenvalues,  and  the  LANZ  code  makes  use  of  shifting  to  quickly  find  the 
eigenvalues.  Both  codes  were  directed  to  find  the  eigenvalues  to  a  relative 
accuracy  of  10““*.  However,  the  subspace  iteration  code  used  an  accuracy 
measure  which  was  more  lax  than  that  used  in  the  LANZ  code.  The  mea¬ 
sure  used  in  the  subspace  code, 

(A‘+'  -  (69) 

w'here  k  is  the  iteration  number,  is  only  a  check  to  determine  whether  an 
eigenvalue  has  stabilized  relative  to  itself.  In  the  LANZ  code 

(II  /<yi  -  OiMy,  ||)/0,  (70) 

is  used  to  check  the  relative  accuracy  of  the  combination  of  the  eigenvalue 
and  the  eigenvector.  Therefore,  the  LANZ  code  is  at  a  disadvantage  to  the 
subspace  code  in  this  comparison  because  the  eigenpairs  are  computed  to 
greater  accuracy  than  in  the  subspace  iteration  code. 

When  the  results  comparing  the  two  codes  are  given,  two  times  are  re¬ 
ported  for  LANZ:  the  processing  time  required  by  the  code  and  the  total 
of  the  system  and  processing  time  required  by  the  code.  The  two  times  are 
given  because  the  EIG2  processor  can  only  report  its  execution  time  as  the 
total  of  system  and  processing  time.  For  the  free  vibration  problem  with 
a  diagonal  mass  matrix,  LANZ  is  shown  in  Figure  13  to  be  about  7  to  14 
times  faster  than  subspace  iteration.  In  Figure  14,  LANZ  is  shown  to  be 
about  7  to  26  times  faster  than  subspace  iteration  for  the  vibration  problem 
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3.73 

46.40 
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35.18 

313.80 

8.92 

Cylinder 
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76.65 

541.50 

7.06 
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mm 

1.04 

1.07 

12.30 

11.50 

Panel 

n  =  2193 

0  =  237 

6.04 

6.17 

82.30 

13.34 

Figure  13:  LANZ  vs.  Subspace  Iteration:  Vibration  Problem  with  Diagonal 
Mass  Matrix 
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n  =  477 
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LANZ 

Program 
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LANZ 

Total 
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Subspace  Iteration 
Total 
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4.05 

4.08 

107.60 

0.44 

0.44 

5.90 

4.69 

4.77 

51.60 

39.26 

39.74 

357.10 

82.88 

83.80 

585.10 

1.74 

1.77 

20.50 

10.61 

10.72 

109.80 

Figure  14:  LANZ  vs.  Subspace  Iteration:  Vibration  Problem  with  Consis¬ 
tent  Mass  Matrix 
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5.30 

92.80 

17.51 
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n  =  7644 
^  =  385 

70.32 

70.84 

523.90 

7.40 

Cylinder 

n  =  12054 
^  =  485 

150.57 

151.44 

992.30 

6.55 

Figure  15:  LANZ  vs.  Subspace  Iteration:  Buckling  Problem 

with  a  consistent  mass  matrix.  LANZ  is  shown  to  be  about  6  to  21  times 
faster  than  subspace  iteration  for  the  buckling  problem  in  Figure  15. 

LANZ’s  advantage  over  subspace  iteration  appears  to  be  diminishing  as 
the  problem  sizes  increase  because  the  factorization  of  the  matrices  takes  a 
larger  proportion  of  the  time  as  the  matrix  size  increases.  Because  each  code 
could  use  the  same  factorization  technique,  the  time  spent  in  factorization 
distorts  the  advantage  that  LANZ  holds  over  subspace  iteration.  To  more 
clearly  illustrate  the  advantage  of  LANZ  over  subspace  iteration,  the  time 
for  factorizing  {K  —  cM)  was  removed  from  the  results  in  Figures  13,  14, 
and  15.  Only  the  totals  of  system  and  processing  time  were  accessible  when 
computing  the  modified  times.  Although  the  time  for  triangular  linear  sys¬ 
tem  solutions  (the  backward,  forward,  and  diagonal  linear  solutions  required 
at  each  step)  is  still  included,  the  modified  times  will  give  the  reader  a  bet¬ 
ter  comparison  of  the  time  spent  in  the  eigensolving  routines.  In  Figure  16, 
LANZ  now  shows  an  advantage  of  up  to  47.18  for  the  vibration  problem  with 
a  diagonal  mass  matrix.  For  the  vibration  problem  with  a  consistent  mass 
matrix,  a  speedup  of  up  to  31,31  can  be  observed  in  Figure  17.  A  speedup 
for  the  buckling  problem  of  up  to  23.64  is  shown  in  Figure  18.  In  Figures  16 
and  17  the  LANZ  code  used  only  one  factorization  per  problem  except  for 
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12.65 

235.50 

18.62 

Cylinder 

n  =  12054 

0  =  485 

23.45 

362.00 

15.44 

Panel 

n  =  477 
0=142 

0.87 

11.50 

13.22 

Panel 

n  =  2193 

0  =  237 

3.63 

71.70 

19.75 

Figure  16:  Comparison  without  Factorization:  Vibration  Problem  with  a 
Diagonal  Mass  Matrix 

the  mast  problem  where  two  factorizations  were  required  for  ten  eigenval¬ 
ues  to  converge.  In  Figure  18  the  LANZ  code  used  only  one  factorization 
per  problem  to  converge  to  ten  eigenvalues  except  in  the  two  large  cylinder 
problems  and  the  mast  problem,  where  two  factorization  were  required. 

9.3  Performance  Benefits  of  Tracking  Eigenvalues 

The  value  of  tracking  the  eigenvalues  will  now  be  shown.  In  Section  7  an 
algorithm  for  tracking  and  computing  the  eigenvalues  of  Tj  is  given.  The  code 
was  run  on  a  Convex  C-1  computer  for  five  free  vibration  problems  from  the 
NASA  Langley  testbed.  Ten  eigenvalues  were  sought  for  each  problem.  To 
assess  the  benefits  of  the  tracking  algorithm,  the  code  was  run  with  the 
tracking  algorithm  first  turned  on  and  then  turned  off.  The  M  matrices  in 
this  experiment  are  diagonal;  however,  the  benefits  would  be  even  greater 
for  non-diagonal  M  matrices.  Reductions  in  execution  time  of  up  to  23 
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Figure  17:  Comparison  without  Factorization:  Vibration  Problem  with 
Consistent  Mass  Matrix 
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Figure  18:  Comparison  without  Factorization:  Buckling  Problem 


Figure  19;  Execution  time  with  and  without  tracking 


percent  are  shown  in  Figure  19.  The  data  in  Figure  19  are  from  a  version  of 
the  program  that  existed  prior  to  changes  made  in  early  1989.  The  current 
version  of  the  program  will  not  work  with  the  tracking  algorithm  turned  off. 
The  gain  in  execution  time  would  actueilly  be  more  mcirked  if  the  the  tracking 
algorithm  could  be  turned  off  in  the  current  version  because  other  parts  of 
the  LANZ  algorithm  that  aren’t  affected  by  the  tracking  algorithm  have 
been  optimized. 


9.4  Multiple  Eigenvalues 

Although  the  test  problems  from  NASA  Langley  had  no  low  eigenvalues 
of  multiplicity  greater  than  one,  some  of  the  eigenvalues  in  the  M2ist  case 
were  very  closely  clustered.  However,  the  performance  of  the  algorithm  with 
exact  multiple  eigenvalues  is  of  interest.  Therefore,  diagonal  test  matrices 
with  multiple  eigenvalues  were  constructed  to  test  whether  LANZ  would 
reveal  their  presence.  In  these  test  cases,  the  correct  number  of  copies  of 
each  eigenvalue  were  found  by  LANZ. 
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10  Concluding  Remarks 

10.1  Conclusions 

For  the  large,  generalized  eigenvalue  problem  arising  from  two  structural 
engineering  applications,  the  vibration  and  buckling  problems,  the  LANZ 
algorithm  was  shown  to  be  superior  to  the  subspace  iteration  method.  Re¬ 
sults  from  several  structural  engineering  problems  were  given  to  suppc.  l  iLId 
claim.  LANZ  is  based  on  the  Lanczos  algorithm  and  makes  use  of  spectral 
transformations,  dynamic  movement  of  a  shift,  and  a  modified  version  of 
selective  reorthogonalization  to  quickly  converge  to  desired  eigenpairs.  The 
dynamic  shift-moving  algorithm  used  by  LANZ  was  described.  The  shifting- 
moving  algorithm  is  based  on  a  cost  analysis  of  the  Lanczos  algorithm  with 
spectral  transformations  and  selective  reorthogonalizations.  A  parallel  algo¬ 
rithm  for  efficiently  solving  the  tridiagonal  matrices  that  arise  when  using 
Lanczos’s  method  was  also  given. 

10.2  Future  Work 

LANZ  has  been  shown  to  perform  well  on  vector  machines,  an  important 
class  of  scientific  computing  machines.  These  classes  show  the  most  promise 
for  solving  very  large  problems.  The  next  step  is  to  shown  that  LANZ  will 
perform  well  on  parallel  and  vector /parallel  computers.  An  examination  of 
the  LANZ  algorithm  based  on  the  analysis  in  Section  6  is  the  logical  first 
step  in  determining  a  strategy  for  parallelizing  LANZ.  A  possible  next  step 
is  to  use  the  Force  programming  language  to  parallelize  the  code  [Jor87]. 
This  language  allows  parallel  loops  to  be  easily  expressed  and  can  be  used 
on  several  different  shared-memory  computers.  The  Force  has  been  shown 
to  be  a  good  language  for  parallel  linear  algebra  applications  [JPV89].  The 
outlined  approach  would  most  likely  provide  a  good  barometer  with  which 
to  assess  the  performance  of  LANZ  on  parallel  machines. 
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A  Sequential  and  Vector  Cost  Analysis 

A  step-by-step  cost  analysis  for  the  Lanczos  algorithm  when  using  trans¬ 
formation  B  (shown  in  Figure  2)  is  given  below  for  sequential  and  vector 
machines. 

Definitions: 

The  semi-bandwidth  of  the  K  matrix. 

Hm-  The  semi-bandwidth  of  the  M  matrix, 
n:  The  number  of  equations  in  the  system. 

daxpy:  A  double  precision  vector  operation  that  computes  ax-fy, 
where  a  is  a  scalar  and  x  and  y  are  vectors. 


Initialization 

1. )  Choose  an  initial  guess,  guess 

SmaJl  cost  ((9(cn)),  but  might  be  larger  depending  on  the 
method  used  for  choosing  the  guess. 

2. )  rp  =  (K  —  crM)~^ M gtie<‘s  (Purifying  rp) 

A. )  Formation  of  the  matrix  {K  —  aM) 

The  matrix  is  formed  from  K  and  M  and  made  available 
to  the  factorization  routine. 

Sequential:  HMn  subtractions  and  multiplications 
Vector.  1  //jvrn-length  daxpy  operation 

B. )  Factorization  of  [K  —  crM) 

Using  Bunch-Kaufman  (or  LDL^  decomposition)  as  de¬ 
scribed  in  Section  8 
Sequential:  n  divisions 

Oijiyji)  multiplications 
0(n//^/2)  multiplications  and  additions 
Vector.  n  scalar  divisions 

n  pf(:-length  vector  by  scalar  multiplications 
nfifc  daxpy  operations  of  average  length 

C. )  Forward  Solve  using  factored  matrix  from  B 

Sequential:  0{nfi}i)  multiplications  and  divisions 
Vector.  n  p/^--length  daxpy  operations 

D. )  Diagonal  Solve  using  factored  matrix  from  B 

Sequential:  3n  multiplications 
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3.) 


4.) 


2n  additions 

Vector.  3  n-length  daxpy  operations 

E.)  Back  Solve  using  factored  matrix  from  B 

Sequential:  O(n^fe)  multiplications  and  divisions 
Vector.  n  /i^;-length  inner  products 

Pi  =  M  rp 

An  n  X  n  banded  matrix-vector  multiplication 
Sequential:  -f  l)n)  multiplications 

0(2pK^)  additions 

Vector.  n  fiie  4-  1-^ength  inner  products 
n  p/c-length  daxpy  operations 
A  =  (roPiY^^ 

An  n-length  inner  product  and  a  squcire  root 
Sequential:  n  multiplications 
n  —  1  additions 
1  square  root 

Vector.  1  n-length  inner  product 
1  square  root 


Program  Loop 

5. )  For  j  =  1,  maximum  number  of  iterations 

6. )  Reorthogonalizatiop  phase 

Orthogonalize  rj_i  and  qj^i  against  “good”  Ritz  vectors 
if  necessary  (see  section  on  orthogonalization  for  details). 
Steps  A  and  B  are  done  only  once  and  only  if  reorthog- 
onalization  is  needed.  Steps  C  and  E  are  done  for  each 
Ritz  vector  that  is  orthogonalized  against  rj_i.  Steps  D 
and  F  are  done  for  each  Ritz  vector  that  is  orthogonalized 
against  qj-i- 

A. )  U  = 

Same  cost  a.s  3 

B. )  <2  =  Mqj-i 

Same  cost  as  3 

C. )  ii  =  yJU 

Multiplication  of  an  n-length  vector  by  a  scalar 
Sequential:  n  multiplications 
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Vector.  1  n-length  vector  by  scalar  multiplication 

D. )  A  =  yli2 

Same  cost  as  C 

E. )  r_,_i  =  rj_i  -  7.y,- 

Orthogonalize  Vj  against  y,- 
Sequential:  n  multiplications 
n  additions 

Vector.  1  n-length  daxpy  operation 

F. )  qj.i  =  qj.i  -  ipiVi 

Same  cost  as  6E 
Orthogonalize  rj  against  y,- 

7. )  qj  = 

Division  of  an  n-length  vector  by  a  scalar 
Sequential:  n  multiplications 
1  division 

Vector.  1  n-length  vector  by  scalar  multiplication 
1  division 

8. )  Pj  =Pjf0j 

Same  cost  as  7 

9. )  (/<  -  crM)rj  =  Pj 

Same  cost  as  parts  C,  D,  and  E  of  3 

10. )  rj  =  ry  - 

Orthogonalize  Vj  against  qj^i 
Same  cost  as  6E 

11. )  Qj  =  rjpj 

Same  cost  as  4  except  no  square  root  is  needed 

12. )  rj  =  rj  -  q^Qj 

Same  cost  as  10 

13. )  pj+i  =  Mrj 

Same  cost  as  3 

14. )  I3j+i  =  {rJpj+iY^^ 

15. )  Compute  the  eigenvalue  of  Tj  and  the  corresponding  error  bounds 

A. )  Calculate  j  eigenvalues  via  Newton’s  method 

Sequential  and  Vector.  Variable,  but  very  small  (0(_7^)) 

B. )  Calculate  j  error  bounds,  Pji 

Sequential:  j  multiplications 
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16. )  Compute  any  converged  Ritz  Vectors 

Cost  per  Ritz  vector  of  computing  y,- 

A. )  Compute  correction  factor,  to,-  =  l/0,(TjS,l 

Sequential:  3j  multiplications 
2j  additions 
j  multiplications 
1  division 

Vector.  3  j-length  daxpy  operations 

1  j-length  vector  by  scalar  multiplication 
1  division 

B. )  Compute  y,  = 

Sequential:  nj  multiplications 
n(j  —  1)  additions 

Vector.  n  j-length  inner  products 

17. )  Halt  if  enough  eigenvalues  have  been  found 

18. )  End  of  Loop 
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Figure  20;  Axially-compressed  circular  cylindrical  shell 
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Figure  21: 


Lowest  two  buckling  modes  of  an  axially-compressed  cylinder 
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